Categorization

Custom Column Categorizations are used to add logical groupings of data based on either the data, query results or user selections. By creating a logical categorization of data in hierarchy column (metric or non-metric), the engine effectively adds a new virtual column to the data model that represents a new way to slice, dice and view query results. These columns generally use mathematical or machine learning algorithms to generate the classifications based on the context of the current query (when they are invoked in Discover) - so they are often "contextually driven". Different column data types dictate which categorization options are available and "Geo Boundaries" categorization is triggered by user interactions in bubble maps only.

Note: Custom Columns allow the user to add granular logic to an existing model. This functionality is not available on MS OLAP, Tabular and SAP BW cubes and models since this is not possible on a predefined model framework

Custom Column Categorization Types

Context Sensitive

Context sensitive categorizations use the existing query values to determine the categories. Once generated, these then remain fixed regardless of subsequent changes to the query.

Outlier

Outliers mathematically resolves which data points in the chosen measure of the query are statistical outliers and flags all points as "in" and "out". The logic uses simple standard deviation to determine the "ins" and "outs" and creates a new custom column that can be re-used in different parts of the query.

The outlier logic is only available on values or measures. It cannot be applied to hierarchies. As described, the calculation is based on the existing data points in the current query. Once determined, the outlier flags remain static regardless of any subsequent changes to the query.

  • The more advanced outlier options can be found in advanced query tools in the query ribbon, which also includes a more in-depth explanation of the outlier logic used.
  • Outlier logic is also available in Model for the greatest access to depth and function.

Bin by Rank

Bin by Rank takes the data points for a metric in a query and bands them evenly into each band based on its RANK POSITION in the list. This function is often called "percentile ranking", where each item is placed into a band based on its value order. The user specifies the number of bins or bands from the menu and the engine simply orders the numeric list and "chops" it up accordingly. By definition, each band will generally have the same number of elements, since the allocation to each band is based on order.

The new custom column that can be re-used in different parts of the query. This binning logic is only available on values or measures. It cannot be applied to hierarchies. As described, the calculation is based on the existing data points in the current query. Once determined, the bin definitions remain static regardless of any subsequent changes to the query.

Bin by Value

Bin by Value creates bins by evenly dividing the range of values found in the query numerically across the number of bands. The each data point is assigned to each band to which it belongs. The user specifies the number of bins or bands from the menu and the engine finds the lowest and highest values in the list, then produces even numeric intervals to create the corresponding to the bands. Since the values are determined to create even band definitions, its very possible that the number of elements per band or bin will be uneven.

The new custom column that can be re-used in different parts of the query. This binning logic is only available on values or measures. It cannot be applied to hierarchies. As described, the calculation is based on the existing data points in the current query. Once determined, the bin definitions remain static regardless of any subsequent changes to the query.

Static

Static categorizations are not based on query values, but rather on mechanisms for selecting elements and adding them to a category. These remain fixed regardless of subsequent changes to the hierarchies.

Geo Boundaries

Geo Boundary categorization creates a new custom column of those map elements selected on a map using the geo-boundary lasso tool, demarcating those inside or outside the boundary. The boundary lasso is user driven from the context menu on a bubble map, and allows the user to draw a circle (with an appropriate circumference) to pick all points within a distance of a center point.

The new custom column that can be re-used in different parts of the query. This geo-boundary logic is only available on map-based hierarchical elements. It cannot be applied to normal hierarchies or measures. Once determined, the bin definitions remain static regardless of any subsequent changes to the query.

Click here for more details on how to use the geo-boundary tools.

Bin by Category

Binning by category creates a new custom column by simply banding the elements in a hierarchy using their alphanumeric captions. This is a convenient way to break down extremely long lists of hierarchical elements into a "phone book" style grouping, to make analysis simpler and easier.

The new custom column that can be re-used in different parts of the query. This binning is only available on hierarchical elements. It cannot be applied to measures. Once determined, the bin definitions remain static regardless of any subsequent changes to the query.

Granular

Bin by Size

Binning by Size creates a new custom column by banding the atomic, granular row values in the underlying table of a metric. This is a convenient way to group transactional items in size BEFORE they are aggregated as part of queries and reports. And unlike the other categorization bins, the technique is to define the size of the bin rather than the number of bins to create. This reverses the creation process, allowing the engine to determine the right number of bins to generate given the data itself.

The new custom column that can be re-used in different parts of the query. This binning is only available on measures.